Selector Specification

JSON Selector Structure

There are many selector types. The set of properties a selector accepts depends on its type.

Common Fields in Selector Descriptor

| property | type | required | description |
|---|---|---|---|
| selectorType | enum | | The selector type which is used to determine the set of other properties. |

| Possible values | Available in user mode | Notes |
|---|---|---|
| paragraph | Paragraph selector | |
| boundary | Search area | Used in Custom |
| relativeBoundary | Relative boundary selector | |
| dynamicCrop | Crop selector | |
| reject | Filter selector | Used in negative conditions |
| selectorBench | Filter selector | Used in positive conditions |
| fontFamily | | @Deprecated |
| fontSize | | @Deprecated |
| fontColor | Font color selector | |
| fontStyle | | @Deprecated |
| font | Font selector | |
| align | Align selector | |
| regExp | Regular expression selector | |
| pattern | Pattern finder selector | |
| iban | IBAN selector | |
| price | Price selector | |
| date | Date selector | |
| vat | VAT selector | |
| integer | Integer selector | |
| time | Time selector | |
| line | Line selector | |
| tableCluster | Table selector | Used when Cluster algorithm is selected |
| table | Table selector | Used when Auto algorithm is selected |
| tableFreq | | |
| pick | Pick by index selector | |
| page | Search area | Used in Page and Custom |
| image | Image selector | |
| barcode | Barcode selector | |
| groupByTb | Grouping selector | |
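
Because selectorType decides which other properties apply, a consumer of a template will typically inspect it first. The following is a minimal sketch of that check, not the product's own parser; the function name `classify_selector` and the two set names are hypothetical, and the values are copied from the table above.

```python
import json

# Selector types from the table above; deprecated ones are tracked separately.
KNOWN_SELECTOR_TYPES = {
    "paragraph", "boundary", "relativeBoundary", "dynamicCrop", "reject",
    "selectorBench", "fontColor", "font", "align", "regExp", "pattern",
    "iban", "price", "date", "vat", "integer", "time", "line",
    "tableCluster", "table", "tableFreq", "pick", "page", "image",
    "barcode", "groupByTb",
}
DEPRECATED_SELECTOR_TYPES = {"fontFamily", "fontSize", "fontStyle"}


def classify_selector(raw: str) -> str:
    """Return 'known', 'deprecated' or 'unknown' for a selector JSON string."""
    selector = json.loads(raw)
    selector_type = selector.get("selectorType")
    if selector_type in DEPRECATED_SELECTOR_TYPES:
        return "deprecated"
    if selector_type in KNOWN_SELECTOR_TYPES:
        return "known"
    return "unknown"
```

For example, `classify_selector('{"selectorType": "fontSize"}')` reports a deprecated type, which a template migration script could use to flag selectors that need updating.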

Selector descriptors

Paragraph Selector

Paragraph Selector Descriptor

| property | type | required | description | Available parameters in user mode | Template schema version |
|---|---|---|---|---|---|
| lineSpacing | enum | | Defines the line spacing coefficient. Possible values: NORMAL, LARGE, HUGE. | Possible values: Normal, Large, Huge | |
| paragraphName | string | | @Deprecated: use paragraphNames instead. Defines the paragraph name. | | [1.0.0, 1.2.0) |
| paragraphNames | List<String> | | Defines the list of paragraph names. | | [1.2.0,) |
| runningText | boolean | | Affects how the recognition result is presented. If false, the lines of the recognized paragraph are joined by a newline character. If true, the lines are joined by a single space character. | | [1.1.1,) |
| excludeParagraphName | boolean | | Affects how the recognition result is presented. If true, the paragraph name is excluded from the recognized paragraph. If false, the paragraph name is included. | | [1.2.0,) |

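
The effect of runningText and excludeParagraphName on the result can be sketched as below. This is an illustration only: the helper `render_paragraph` is hypothetical, and it assumes the paragraph name appears at the start of the first recognized line, which the specification does not state. Both flags are passed explicitly because their defaults are not documented here.

```python
def render_paragraph(lines, paragraph_name, *, running_text, exclude_paragraph_name):
    """Join recognized lines the way the two flags are described above."""
    text_lines = list(lines)
    # Assumption: the paragraph name is a prefix of the first line.
    if exclude_paragraph_name and text_lines and text_lines[0].startswith(paragraph_name):
        text_lines[0] = text_lines[0][len(paragraph_name):].lstrip()
        if not text_lines[0]:
            text_lines.pop(0)  # the first line held only the name
    # runningText=true joins with a space, false joins with a newline.
    separator = " " if running_text else "\n"
    return separator.join(text_lines)
```

With the lines `["My Paragraph", "first", "second"]`, enabling both flags yields one running line without the name, while disabling both keeps the name and the line breaks.
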
Min Paragraph Selector Example
{
"selectorType":"paragraph",
"lineSpacing":"LARGE"
}

Full Paragraph Selector Example

{
"selectorType": "paragraph",
"lineSpacing": "NORMAL",
"paragraphNames": [
"My Paragraph",
"Second paragraph"
],
"runningText": true,
"excludeParagraphName": false
}

Boundary Selector

Boundary Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| topUsed | boolean | | Defines whether the top boundary shall be used. | |
| bottomUsed | boolean | | Defines whether the bottom boundary shall be used. | |
| leftUsed | boolean | | Defines whether the left boundary shall be used. | |
| rightUsed | boolean | | Defines whether the right boundary shall be used. | |
| area | Rectangle | | Defines the location of boundary selector. | |

At least one of the properties topUsed, bottomUsed, leftUsed, rightUsed shall be true. If all of them are false, an exception is thrown.
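
The rule above can be checked before a template is submitted. The sketch below is a hypothetical pre-validation helper, not the product's validator; it assumes a missing flag counts as false, and the exception type is chosen only for illustration.

```python
BOUNDARY_SIDES = ("topUsed", "bottomUsed", "leftUsed", "rightUsed")


def validate_boundary(selector: dict) -> None:
    """Raise ValueError when no boundary side is enabled."""
    # Assumption: an absent flag is treated as false.
    if not any(selector.get(side, False) for side in BOUNDARY_SIDES):
        raise ValueError(
            "boundary selector needs at least one of: " + ", ".join(BOUNDARY_SIDES)
        )
```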

Boundary Selector Example

{
"selectorType": "boundary",
"topUsed": true,
"bottomUsed": false,
"leftUsed": true,
"rightUsed": true,
"area": {
"left": 11.0,
"right": 111.0,
"top": 111.0,
"bottom": 11.0
}
}

Relative Boundary Selector

Relative Boundary Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| side | enum | | Defines the side for relation. Possible values: LEFT, RIGHT, TOP, BOTTOM. Default value: LEFT. | Possible values: Left, Right, Top, Bottom |

Min Relative Boundary Selector Example
{
"selectorType": "relativeBoundary"
}

Full Relative Boundary Selector Example

{
"selectorType": "relativeBoundary",
"side": "RIGHT"
}

Crop Selector

Crop Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| cropVertically | enum | | Determines on which side of the crop area the vertical crop is performed. Possible values: BEFORE, AFTER. | |
| areaIndex | integer | | Defines which area to take as the beginning of the crop. Non-zero and 1-based; negative values count from the end. Default: 1. | |
| areaSelectors | list<Selector> | | Internal selectors used to search for the crop area. The list must be present and not empty. | |
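
The areaIndex convention (non-zero, 1-based, negative values counting from the end) maps onto a zero-based position as in the sketch below. The helper name `resolve_one_based` is hypothetical, and raising on an out-of-range index is an assumption; the specification does not say how such a value is handled.

```python
def resolve_one_based(index: int, count: int) -> int:
    """Convert a 1-based, possibly negative index into a zero-based position."""
    if index == 0:
        raise ValueError("index must be non-zero")
    # Positive indexes count from the start, negative ones from the end.
    position = index - 1 if index > 0 else count + index
    if not 0 <= position < count:
        # Assumption: out-of-range indexes are rejected.
        raise IndexError(f"index {index} is outside 1..{count}")
    return position
```

With three found areas, an areaIndex of 1 resolves to the first area and -1 to the last one.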

Min Crop Selector Example

{
"selectorType": "dynamicCrop",
"cropVertically": "BEFORE",
"areaSelectors": [
{
"selectorType" : "anySelector"
},
{
"selectorType" : "anySelector1"
}
]
}

Full Crop Selector Example

{
"selectorType": "dynamicCrop",
"cropVertically": "BEFORE",
"areaIndex": 1,
"areaSelectors": [
{
"selectorType" : "anySelector"
},
{
"selectorType" : "anySelector1"
}
]
}

Reject Selector

Reject Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| rejectBy | list<Selector> | | Internal selectors used to define the data to reject. The list must be present and not empty. | |

Reject Selector Example

{
"selectorType": "reject",
"rejectBy": [
{
"selectorType" : "anySelector"
},
{
"selectorType" : "anySelector1"
}
]
}

SelectorBench Selector

SelectorBench Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| subType | String | | SubType of selector bench. | |
| selectors | list<Selector> | | Internal selectors of selector bench. The list is optional. If it is null or empty, the selector bench does not affect recognition. | |

Min SelectorBench Selector Example

{
"selectorType": "selectorBench"
}

Full SelectorBench Selector Example

{
"selectorType": "selectorBench",
"subType": "filter",
"selectors": [
{
"selectorType" : "anySelector"
},
{
"selectorType" : "anySelector"
},
{
"selectorType" : "anySelector"
}
]
}

Font Selectors

Font Selector Descriptor

| property | type | required | description | Available parameters in user mode | Template schema version |
|---|---|---|---|---|---|
| fontFamily | FontFamily | | @Deprecated: use family instead. The font family descriptor to select. | | [1.0.0, 1.4.0) |
| fontSize | FontSize | | @Deprecated: use size instead. The font size descriptor to select. | | [1.0.0, 1.4.0) |
| fontStyle | FontStyle | | @Deprecated: use fontStyles instead. The font style descriptor to select. | | [1.0.0, 1.4.0) |
| family | String | | Font name to select. | | [1.4.0,) |
| size | One of: StaticType, RangeType | | The font size interval with double values. A static interval selects one precise font size. A range interval selects any font size within the range. | | [1.4.0,) |
| fontStyles | list<enum> | | The font styles list to select. Possible values in list: NORMAL, BOLD, ITALIC, BOLD_ITALIC. | Possible values in list: Normal, Bold, Italic, Bold italic | [1.4.0,) |

From template schema version 1.4.0 at least one of the properties family, size, fontStyles should be present.

Min Font Selector Example with family

{
"selectorType": "font",
"family": "Calibri"
}

Min Font Selector Example with static size

{
"selectorType": "font",
"size": {
"structureType": "static",
"value": "10.0"
}
}

Min Font Selector Example with range size

{
"selectorType": "font",
"size": {
"structureType": "range",
"min": "9.3",
"max": "15.3"
}
}

Min Font Selector Example with font styles

{
"selectorType": "font",
"fontStyles": ["AUTO", "NORMAL", "BOLD_ITALIC"]
}

Full Font Selector Example

{
"selectorType": "font",
"family": "Times",
"size": {
"structureType": "static",
"value": "12.3"
},
"fontStyles": [
"NORMAL",
"BOLD_ITALIC"
]
}

FontColor Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| color | string (rgb format) | | The font color. If missing, it is auto-evaluated from the font color in the reference PDF. The format is #rrggbb, where rr, gg, bb are the hex representations of the corresponding color values. | The format is #rrggbb |
| tolerance | double | | Comparison tolerance. Default value: 0. | |
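
The #rrggbb format can be decoded as in the sketch below. How the tolerance is applied is not defined in this specification, so the comparison here is an assumption: each channel may differ by at most the tolerance. Both function names are hypothetical.

```python
def parse_rgb(color: str) -> tuple:
    """Decode '#rrggbb' into a (red, green, blue) tuple of 0..255 integers."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError(f"expected #rrggbb, got {color!r}")
    # int(..., 16) raises ValueError on non-hex digits.
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


def colors_match(expected: str, actual: str, tolerance: float = 0.0) -> bool:
    """Assumed rule: every channel differs by no more than the tolerance."""
    pairs = zip(parse_rgb(expected), parse_rgb(actual))
    return all(abs(e - a) <= tolerance for e, a in pairs)
```

Under this assumed rule, a tolerance of 0 demands an exact color match, which is consistent with the documented default.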

Min FontColor Selector Example
{
"selectorType": "fontColor",
"color": "#ffffff"
}

Full FontColor Selector Example

{
"selectorType": "fontColor",
"color": "#ffffff",
"tolerance": 5.1
}

FontFamily Selector Descriptor

@Deprecated Use the Font Selector with the family property instead.

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| fontName | string | | The font name. | |

FontFamily Selector Example

{
"selectorType": "fontFamily",
"fontName": "Times"
}

FontSize Selector Descriptor

@Deprecated Use the Font Selector with the size property instead.

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| size | One of: StaticType, RangeType | | The font size interval with double values. A static interval selects one precise font size. A range interval selects any font size within the range. If missing, a static interval is evaluated from the reference PDF. | |

FontSize with Static Interval Selector Example

{
"selectorType": "fontSize",
"size": {
"structureType": "static",
"value": "12.3"
}
}

FontSize with Range Interval Selector Example

{
"selectorType": "fontSize",
"size": {
"structureType": "range",
"min": "9.3",
"max": "15.3"
}
}

FontStyle Selector Descriptor

@Deprecated Use the Font Selector with the fontStyles property instead.

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| styles | list<enum> | | The font styles list to select. Possible values in list: NORMAL, BOLD, ITALIC, BOLD_ITALIC. | Possible values in list: Normal, Bold, Italic, Bold italic |

Min FontStyle Selector Example
{
"selectorType": "fontStyle"
}

Full FontStyle Selector Example

{
"selectorType": "fontStyle",
"styles": ["AUTO", "NORMAL", "BOLD_ITALIC"]
}

Align Selector

Align Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| alignFilter | enum | | Defines which align to use. Possible values: LEFT, RIGHT. | |
| left | double | | Defines left align boundary. | |
| right | double | | Defines right align boundary. | |

Align Selector Example

{
"selectorType": "align",
"alignFilter": "RIGHT",
"left": 11.3,
"right": 222.3
}

RegExp Based Selector Descriptors

RegExp Selector Descriptor

Selector for extracting text that matches the specified patterns.

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| patterns | list<string> | | Defines the list of patterns to match. The list shall not be empty. | |
| selectLine | integer | | Defines the index of the line to select. Shall be at least 1 and at most the size of the patterns list. Default: the size of the patterns list. | |
| checkLocation | boolean | | Defines whether the location check is applied. If false, all regular expression search results are added to the selector results. If true, only the results that fit within the provided left and right thresholds are added. Default: false. | |
| leftThreshold | double | | Defines the left location threshold for the location check. Default: 0 | |
| rightThreshold | double | | Defines the right location threshold for the location check. Default: 0 | |

Min RegExp Selector Example
{
"selectorType": "regExp",
"patterns": ["Account number", "\\d+"]
}

Full RegExp Selector Example

{
"selectorType": "regExp",
"patterns": ["Account number", "\\d+", "@@date"],
"selectLine": 2,
"checkLocation": true,
"leftThreshold": 2.1,
"rightThreshold": 3.1
}

Pattern Selector Descriptor

| property | type | required | description | Available parameters in user mode | Template schema version |
|---|---|---|---|---|---|
| prefix | string | | @Deprecated: use prefixes instead. Defines the pattern prefix. Default: no prefix, i.e. an empty string. | | [1.0.0, 1.2.0) |
| suffix | string | | @Deprecated: use suffixes instead. Defines the pattern suffix. Default: no suffix, i.e. an empty string. | | [1.0.0, 1.2.0) |
| prefixes | List<String> | | Defines the pattern prefix list. Default: no prefixes, i.e. an empty list. | | [1.2.0,) |
| suffixes | List<String> | | Defines the pattern suffix list. Default: no suffixes, i.e. an empty list. | | [1.2.0,) |
| innerBasePattern | One of: Integer Selector, IBAN Selector, Price Selector, VAT Selector, Time Selector, Date Selector | | Defines the middle part of the pattern, if needed. | | |

Min Pattern Selector Example
{
"selectorType": "pattern"
}

Full Pattern Selector Example

{
"selectorType": "pattern",
"prefixes": [
"Expected IBAN: ",
"Expected DATE: "
],
"suffixes": [
" is valid.",
" empty."
],
"innerBasePattern": {
"selectorType": "iban",
"evaluatedPattern": "(?<iban>[A-Z]{2}\\d{2}\\s*(\\d{4}\\s*)+(\\d{1,4}))"
}
}

Simple Pattern Based Selectors

The simple pattern-based selectors are:

  • iban
  • price
  • vat
  • time
  • integer

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| evaluatedPattern | string | | Defines the pattern to use. | Auto-generated using select tool |

IBAN Selector Example

{
"selectorType": "iban",
"evaluatedPattern": "(?<iban>[A-Z]{2}\\d{2}\\s*(\\d{4}\\s*)+(\\d{1,4}))"
}

Price Selector Example

{
"selectorType": "price",
"evaluatedPattern": "(?<price>\\d{1,3}(,\\d{3})*(\\.\\d{1,2})?)\\s*\\$"
}

VAT Selector Example

{
"selectorType": "vat",
"evaluatedPattern": "AT\\s*U[0-9]{8}"
}

Time Selector Example

{
"selectorType": "time",
"evaluatedPattern": "(?<{1}>(1[012]|[1-9]){0}[0-5][0-9](\\s)?(?i)(am|pm|a.m.|p.m.|AM|PM|A.M.|P.M.))"
}

Integer Selector Example

{
"selectorType": "integer",
"evaluatedPattern": "\\d+"
}

Date Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| pattern | string | | The user-friendly date pattern, which can be additionally set. | |
| evaluatedPattern | string | | Evaluated pattern to be used in processing. | Auto-evaluated in the Editor from the document selection or based on pattern |

Min Date Selector Example
{
"selectorType": "date",
"evaluatedPattern": "(?:[1-9]|0[1-9]|[1-2][0-9]|3[0-1])-(?:[1-9]|0[1-9]|1[0-2])-\\d{4}"
}

Full Date Selector Example

{
"selectorType": "date",
"pattern": "dd-MM-yyyy",
"evaluatedPattern": "(?:[1-9]|0[1-9]|[1-2][0-9]|3[0-1])-(?:[1-9]|0[1-9]|1[0-2])-\\d{4}"
}
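
An evaluated pattern can be tried out before it goes into a template. The sketch below loads the full date selector as strict JSON, where each regex backslash has to be written as a double backslash, and tests a few strings. It uses Python's `re` module purely for illustration; the regex engine used by the product may differ (for instance, Python does not accept the `(?<name>...)` named-group syntax seen in the IBAN and price patterns). The helper `is_date` is hypothetical.

```python
import json
import re

# Raw string, so the JSON text itself contains the two-character escape \\d.
raw = r'''
{
  "selectorType": "date",
  "pattern": "dd-MM-yyyy",
  "evaluatedPattern": "(?:[1-9]|0[1-9]|[1-2][0-9]|3[0-1])-(?:[1-9]|0[1-9]|1[0-2])-\\d{4}"
}
'''

selector = json.loads(raw)
date_regex = re.compile(selector["evaluatedPattern"])


def is_date(text: str) -> bool:
    """True when the whole string matches the evaluated date pattern."""
    return date_regex.fullmatch(text) is not None
```

Whole-string matching is used here on purpose: a substring search would accept the tail of an invalid value such as `32-12-2024`.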

Line Selector

Line Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| charSpacing | enum | | Defines the character spacing coefficient. Possible values: NORMAL, LARGE, HUGE. | Possible values: Normal, Large, Huge |

Line Selector Example
{
"selectorType": "line",
"charSpacing": "NORMAL"
}

Cluster Table Selector

Cluster Table Selector Descriptor

| property | type | required | description |
|---|---|---|---|
| headers | list<string> | | Defines headers of the table to match. |
| determiningColumn | integer | | Defines the determining column for cluster algorithm. |
| row | One of: StaticType, RangeType | | Specifies a single row or an interval of rows to match. Static and Range type values are expected to be strings holding integer values. If missing, the range from the first row to the last is used. |
| column | One of: StaticType, RangeType, NamedType | | Specifies a single column or an interval of columns to match. Static and Range type values are expected to be strings holding integer values. NamedType matches the column by its name. If missing, the range from the first column to the last is used. |
| format | enum | | Defines the format of the specified headers. Possible values: SIMPLE, REGEXP. Default value: SIMPLE. |

Min Cluster Table Selector Example
{
"selectorType": "tableCluster",
"headers": ["header 1", "header 2"],
"determiningColumn": 0
}

Full Cluster Table Selector Example

{
"selectorType": "tableCluster",
"headers": ["header 1", "header 2"],
"determiningColumn": 2,
"row": {
"structureType": "range",
"min": "1",
"max": "10"
},
"column": {
"structureType": "range",
"min": "1",
"max": "2"
},
"format": "REGEXP"
}

Table Selector

Table Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| headers | list<string> | | Defines headers of the table to match. | |
| determiningColumn | integer | | Defines the determining column for cluster algorithm. | |
| row | One of: StaticType, RangeType | | Specifies a single row or an interval of rows to match. Static and Range type values are expected to be strings holding integer values. If missing, the range from the first row to the last is used. | |
| column | One of: StaticType, RangeType, NamedType | | Specifies a single column or an interval of columns to match. Static and Range type values are expected to be strings holding integer values. NamedType matches the column by its name. If missing, the range from the first column to the last is used. | |
| format | enum | | Defines the format of the specified headers. Possible values: SIMPLE, REGEXP. Default value: SIMPLE. | Available as toggle button "Allow regular expressions". Possible values: OFF, ON |

Min Table Selector Example
{
"selectorType": "table",
"headers": ["header 1", "header 2"],
"determiningColumn": 2
}

Full Table Selector Example

{
"selectorType": "table",
"headers": ["header 1", "header 2"],
"determiningColumn": 2,
"row": {
"structureType": "range",
"min": "1",
"max": "10"
},
"column": {
"structureType": "range",
"min": "1",
"max": "2"
},
"format": "REGEXP"
}

Frequency Table Selector

Frequency Table Selector Descriptor

| property | type | required | description |
|---|---|---|---|
| frequencyArea | Rectangle | | Defines the rectangle for frequency algorithm. |
| row | One of: StaticType, RangeType | | Specifies a single row or an interval of rows to match. Static and Range type values are expected to be strings holding integer values. If missing, the range from the first row to the last is used. |
| column | One of: StaticType, RangeType, NamedType | | Specifies a single column or an interval of columns to match. Static and Range type values are expected to be strings holding integer values. NamedType matches the column by its name. If missing, the range from the first column to the last is used. |

Min Frequency Table Selector Example

{
"selectorType": "tableFreq",
"frequencyArea": {
"left": 11,
"right": 111,
"top": 111,
"bottom": 11
}
}

Full Frequency Table Selector Example

{
"selectorType": "tableFreq",
"frequencyArea": {
"left": 11,
"right": 111,
"top": 111,
"bottom": 11
},
"row": {
"structureType": "range",
"min": "1",
"max": "10"
},
"column": {
"structureType": "range",
"min": "1",
"max": "2"
}
}

Picker Selector

Picker Selector Descriptor

| property | type | required | description | Available parameters in user mode | Template schema version |
|---|---|---|---|---|---|
| index | One of: StaticType, RangeType | | @Deprecated: use indexes instead. Specifies the range to match. Static and Range type values are expected to be strings holding integer values. | | [1.0.0, 1.3.0) |
| indexes | List<RangeGroup> | | List of range groups specifying the elements to select. An element is selected if its index matches at least one range in the list. An empty list selects no elements. | | [1.3.0,) |
| groupType | enum | | Defines grouping type. Possible values: SAME, BUNCH, LINE, PARAGRAPH. Default value: SAME | Possible values: Leave Unchanged, Character, Line, Paragraph | |

Min Picker Selector Example
{
"selectorType": "pick",
"indexes": []
}

Full Picker Selector Example

{
"selectorType": "pick",
"indexes": [
{
"start": 2,
"end": -30,
"segmentSize": 3,
"segmentGap": 2
},
{
"start": -40,
"end": 55,
"segmentSize": 11,
"segmentGap": 1
}
],
"groupType": "LINE"
}

Page Selector

Page Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| pageNumber | One of: StaticType, RangeType | | Specifies the pages to match. Static and Range type values are expected to be strings holding integer values. For a range interval, both values shall be present and non-zero. | |

Full Page Selector With Single Page Example
{
"selectorType": "page",
"pageNumber": {
"structureType": "static",
"value": "1"
}
}

Full Page Selector With Page Range Example

{
"selectorType": "page",
"pageNumber": {
"structureType": "range",
"min": "1",
"max": "10"
}
}

Image Selector

Image Selector Descriptor

| property | type | required | description | Available parameters in user mode | Template schema version |
|---|---|---|---|---|---|
| index | One of: StaticType, RangeType | | @Deprecated: use indexes instead. Specifies the range to match. Static and Range type values are expected to be strings holding integer values. Indexes can't be zero. | | [1.0.0, 1.3.0) |
| indexes | List<RangeGroup> | | List of range groups specifying the images to select. An image is selected if its index matches at least one range in the list. An empty list selects no images. | | [1.3.0,) |
| width | One of: StaticType, RangeType | | Specifies the image width to match. The values in the static type and range type must be either an empty string (indicating no limit on the minimum or maximum) or an integer with an optional suffix, one of pt, in, cm, mm. Examples: "", "5", "5pt", "5in", "5cm", "5mm". If the suffix is missing, the default "pt" is used. | | |
| height | One of: StaticType, RangeType | | Specifies the image height to match. The values in the static type and range type must be either an empty string (indicating no limit on the minimum or maximum) or an integer with an optional suffix, one of pt, in, cm, mm. Examples: "", "5", "5pt", "5in", "5cm", "5mm". If the suffix is missing, the default "pt" is used. | | |
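
The width and height value format can be read as in the sketch below. It is an illustration under stated assumptions: the helper `parse_size` is hypothetical, and the conversion to points uses the common typographic definition of 72 points per inch, which this specification does not confirm.

```python
import re

# Assumption: 1 in = 72 pt, so 1 cm = 72 / 2.54 pt and 1 mm = 72 / 25.4 pt.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}
SIZE_PATTERN = re.compile(r"(\d+)(pt|in|cm|mm)?")


def parse_size(text: str):
    """Return the size in points, or None for an empty string (no limit)."""
    if text == "":
        return None
    match = SIZE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid size: {text!r}")
    value, unit = match.groups()
    # A missing suffix falls back to "pt", as the description states.
    return int(value) * POINTS_PER_UNIT[unit or "pt"]
```

Normalizing both bounds to one unit makes a range such as `"min": "", "max": "10cm"` easy to compare against a measured image width.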

Image Selector Example

{
"selectorType": "image",
"indexes": [
{
"start": 2,
"end": -30,
"segmentSize": 3,
"segmentGap": 2
},
{
"start": -40,
"end": 55,
"segmentSize": 11,
"segmentGap": 1
}
],
"width": {
"structureType": "range",
"min": "",
"max": "10cm"
},
"height": {
"structureType": "range",
"min": "1pt",
"max": ""
}
}

Barcode Selector

Barcode Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| format | enum | | Defines barcode format to match. Possible values: ALL, ALL_1D, ALL_2D, AZTEC, CODABAR, CODE_39, CODE_93, CODE_128, DATA_MATRIX, EAN_8, EAN_13, ITF, PDF_417, QR, RSS_14, RSS_EXPANDED, UPC_A, UPC_E. | Possible values: All, All 1D, All 2D, Aztec, Codabar, Code39, Code93, Code128, DataMatrix, EAN8, EAN13, ITF, PDF417, QR, RSS-14, RSS-Expanded, UPC-A, UPC-E |
| barcodeLocation | Rectangle | | Defines the location of barcode selector. | |

Barcode Selector Example

{
"selectorType": "barcode",
"format": "ALL",
"barcodeLocation": {
"left": 11.0,
"right": 111.0,
"top": 111.0,
"bottom": 11.0
}
}

Group Selector

Group Selector Descriptor

| property | type | required | description | Available parameters in user mode |
|---|---|---|---|---|
| parentName | string | | Defines the string id for matching data fields with group selector. | |

Group Selector Example
{
"selectorType": "groupByTb",
"parentName": "My ID"
}

Complex JSON Structures

Common Fields in Complex JSON Structures Descriptor

| property | type | required | description |
|---|---|---|---|
| structureType | enum | | The structure type which is used to determine the set of other properties. Possible values: static, range, named. |

Static Type Descriptor

| property | type | required | description |
|---|---|---|---|
| value | string | | Fixed value. |

Static Type Example
{
"structureType": "static",
"value": "15"
}

Range Type Descriptor

| property | type | required | description |
|---|---|---|---|
| min | string | | Min interval value. Missing value means no minimum. |
| max | string | | Max interval value. Missing value means no maximum. |

Range Type Example
{
"structureType": "range",
"min": "9.3",
"max": "15.3"
}

Named Type Descriptor

| property | type | required | description |
|---|---|---|---|
| name | string | | Name value. |

Named Type Example
{
"structureType": "named",
"name": "Custom name"
}
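
How a numeric value might be tested against a static or range structure is sketched below. This is not the product's implementation: the helper `matches_number` is hypothetical, the range bounds are assumed to be inclusive, and an empty string is treated like a missing bound (the image selector documents the empty string as "no limit").

```python
def matches_number(structure: dict, value: float) -> bool:
    """Check a number against a 'static' or 'range' structure."""
    kind = structure["structureType"]
    if kind == "static":
        # Values are stored as strings, so convert before comparing.
        return float(structure["value"]) == value
    if kind == "range":
        low = structure.get("min")
        high = structure.get("max")
        # Assumption: bounds are inclusive; missing or "" means unbounded.
        if low not in (None, "") and value < float(low):
            return False
        if high not in (None, "") and value > float(high):
            return False
        return True
    raise ValueError(f"structureType {kind!r} does not describe a number")
```

The named type is rejected here because it identifies something by name (for example a table column) rather than by a numeric value.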

Other Inner Structures

RangeGroup Descriptor

| property | type | required | description |
|---|---|---|---|
| start | int | | Range group start index. Non-zero and 1-based; negative values count from the end. Default: 1. |
| end | int | | Range group end index. Non-zero and 1-based; negative values count from the end. Default: -1. |
| segmentSize | int | | Segment size. Specifies the number of elements to pick up before the gap. Should be positive. Default value: 1. |
| segmentGap | int | | Segment gap. Specifies the number of elements to skip before the next picked group segment. Should be non-negative. Default value: 0. |

The minimum example is a range group with all defaults, which picks up all indexes.

Min RangeGroup Example

{
}

The maximum example below uses all fields. For 1-based indexes in the range [1,20] it picks up indexes 2, 3, 4, 7, 8, 9, 12, 13, 14: the [start,end] range is split into chunks of segmentSize + segmentGap elements, and the first segmentSize elements are picked from each chunk.

Max RangeGroup Example

{
"start": 2,
"end": 15,
"segmentSize": 3,
"segmentGap": 2
}
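
The picking rule described for RangeGroup can be sketched as follows. The function `expand_range_group` and its parameter names are hypothetical, and clamping the range to the available elements is an assumption; the defaults mirror the descriptor table.

```python
def expand_range_group(count, start=1, end=-1, segment_size=1, segment_gap=0):
    """Return the 1-based indexes a range group picks out of `count` elements."""
    if start == 0 or end == 0:
        raise ValueError("start and end must be non-zero")
    if segment_size <= 0 or segment_gap < 0:
        raise ValueError("segmentSize must be positive, segmentGap non-negative")
    # Negative values count from the end: -1 is the last element.
    first = start if start > 0 else count + start + 1
    last = end if end > 0 else count + end + 1
    # Assumption: the range is clamped to the elements that exist.
    first, last = max(first, 1), min(last, count)
    step = segment_size + segment_gap
    # Keep the first segment_size positions of every chunk of length step.
    return [i for i in range(first, last + 1) if (i - first) % step < segment_size]
```

Calling `expand_range_group(20, start=2, end=15, segment_size=3, segment_gap=2)` returns `[2, 3, 4, 7, 8, 9, 12, 13, 14]`, matching the indexes listed for the maximum example, and calling it with only `count` returns every index, matching the minimum example.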